ENH: Add future.python_scalars #63016
base: main
Conversation
Plan to run a full set of ASVs next week; some microbenchmarks:

```python
from pandas.core.dtypes.cast import maybe_unbox_numpy_scalar

with pd.option_context("python_scalars", True):
    %timeit maybe_unbox_numpy_scalar(np.int64(2))
    # 828 ns ± 9.91 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
    %timeit maybe_unbox_numpy_scalar(2)
    # 161 ns ± 0.414 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

ser = pd.Series([1, 2, 3] * 10_000)
with pd.option_context("python_scalars", True):
    %timeit ser.sum()
    # 9.42 μs ± 423 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
with pd.option_context("python_scalars", False):
    %timeit ser.sum()
    # 8.28 μs ± 137 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```
Full ASVs are below, only showing where there was a 10% or more regression. In the full list, only the following two actually hit the function. I was curious why only min/max showed up as regressions in series_methods.NanOps.
@jbrockmendel - you good with the ASVs here?
No complaints here. |
```python
else:
    result = result.reshape(1)
    if using_python_scalars():
        result = np.array([result])
```
Why do this instead of maybe_unbox_numpy_scalar?
The result here prior to L1543 is already a Python scalar when future.python_scalars=True due to calling the reduction function. In this block, keepdims=True so we need to convert it to a NumPy array.
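A minimal sketch of that wrapping step, using a plain Python int to stand in for the reduction result (names are illustrative, not the actual pandas internals):

```python
import numpy as np

# Assume the reduction already returned a Python scalar because
# future.python_scalars is enabled (here we just use a plain int).
result = 6

# keepdims=True semantics require a shape-(1,) array, so wrap explicitly:
result = np.array([result])
assert result.shape == (1,)
```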
```python
if isinstance(result, np.longdouble):
    result = float(result)
else:
    result = value.item()
```
I know this will mess up on a timedelta64:

```python
obj = np.timedelta64(1, "ns")
assert isinstance(obj, np.generic)
obj.item()  # 1 - a plain int (the nanosecond count), not a timedelta
```

I don't know if there are other cases where obj.item() messes up, but I'm wary of it. Heads up.
Thanks - will add a test for all dtypes. Here is the full list of scalars and their corresponding item() type, excluding datetime/timedelta. The only other problematic one is complex256.
```text
?  bool        <class 'bool'>
b  int8        <class 'int'>
h  int16       <class 'int'>
i  int32       <class 'int'>
l  int64       <class 'int'>
q  int64       <class 'int'>
n  int64       <class 'int'>
p  int64       <class 'int'>
B  uint8       <class 'int'>
H  uint16      <class 'int'>
I  uint32      <class 'int'>
L  uint64      <class 'int'>
Q  uint64      <class 'int'>
N  uint64      <class 'int'>
P  uint64      <class 'int'>
e  float16     <class 'float'>
f  float32     <class 'float'>
d  float64     <class 'float'>
g  float128    <class 'numpy.longdouble'>
F  complex64   <class 'complex'>
D  complex128  <class 'complex'>
G  complex256  <class 'numpy.clongdouble'>
S  |S1         <class 'bytes'>
U  <U1         <class 'str'>
V  |V0         <class 'bytes'>
```
For datetime/timedelta, the arrow dtypes already return pd.Timedelta and pd.Timestamp. I think we can align the NumPy dtypes to do the same.
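The table above can be reproduced with a small loop over the type characters (a sketch, restricted to the bool/numeric/complex codes and not the exact script used here; exact dtype names for `g`/`G` vary by platform):

```python
import numpy as np

# For each type character, show the dtype name and the Python type
# that .item() produces on the corresponding NumPy scalar.
for char in "?bhilqBHILQefdgFDG":
    dt = np.dtype(char)
    scalar = dt.type(0)
    print(char, dt.name, type(scalar.item()))
```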
```python
result = getattr(df.C, op)()
assert isinstance(result, np.float64)
if using_python_scalars:
    assert isinstance(result, float)
```
I think np.float64 subclasses float, so this won't exclude float64.
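That subclass relationship is easy to verify; a sketch of a check that does distinguish the two (the helper name is made up for illustration):

```python
import numpy as np

x = np.float64(1.5)
assert isinstance(x, float)  # passes: np.float64 subclasses Python float


def is_plain_float(obj):
    # Exclude NumPy scalar types to assert a *plain* Python float.
    return isinstance(obj, float) and not isinstance(obj, np.floating)


assert not is_plain_float(np.float64(1.5))
assert is_plain_float(1.5)
```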
```python
if using_python_scalars:
    assert isinstance(result, int)
else:
    assert result.dtype == "uint64"
```
just check type rather than dtype?
```python
assert 0 == s.skew()
assert isinstance(s.skew(), np.float64)  # GH53482
if using_python_scalars:
    assert isinstance(s.skew(), float)
```
won't exclude float64
pandas/tests/series/test_ufunc.py (Outdated)
```python
    tm.assert_series_equal(result, expected)
else:
    expected = values[1]
    if using_python_scalars and values.dtype.kind in ["i", "f"]:
```
NBD but checking for kind in "if" is very slightly faster than checking for kind in ["i", "f"].
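Both spellings are equivalent for single-character kind codes, since `in` on a string does substring membership:

```python
import numpy as np

kind = np.dtype("int64").kind  # "i"
assert kind in "if"            # substring membership (slightly faster)
assert kind in ["i", "f"]      # list membership, same result here
assert np.dtype("float32").kind in "if"
assert np.dtype("object").kind not in "if"
```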
```python
if isinstance(all_data, Series):
    assert not (std_x < 0).any()
else:
    assert not (std_x < 0).any().any()
```
I guess np.bool_(True).any() returns itself? That's kind of convenient. Could go in the list Joris asked for of potential downsides.
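Indeed, NumPy scalars expose the array reduction methods, so chaining a second .any() onto an np.bool_ is a no-op; a quick check:

```python
import numpy as np

flag = (np.array([1, 2]) < 0).any()  # an np.bool_, not a plain bool
assert isinstance(flag, np.bool_)
assert flag.any() == flag            # .any() on the scalar returns itself
assert not flag.any().any()          # so chained reductions keep working
```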
For this particular case, I'd say axis=None should be preferred. But perhaps there are others where that is useful.
```python
if isinstance(all_data, Series):
    assert not (var_x < 0).any()
else:
    assert not (var_x < 0).any().any()
```
If this pattern is going to show up a lot, could we make a helper in pd._testing?
Changed to axis=None, no branching.
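For reference, axis=None makes .any() reduce over all axes at once, so the same line works for both Series and DataFrame without isinstance branching (a sketch with toy data):

```python
import pandas as pd

ser = pd.Series([1.0, 2.0])
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# One spelling for both container types:
assert not (ser < 0).any(axis=None)
assert not (df < 0).any(axis=None)
```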
Adds an experimental option to return Python scalars instead of NumPy scalars across the API. This is not yet fully implemented everywhere, e.g. Series.__getitem__, but I'm hoping reductions are a substantial chunk.

This is complicated by #62988, where it was found that many of our doctests are not running. We run those doctests using NumPy>=2, and if we were to get those doctests to pass as-is, we would need to change the NumPy reprs from e.g. 2 to np.int64(2). If we then change reductions et al. to return Python scalars, we'd then change all the reprs back from e.g. np.int64(2) to 2. So instead I think we can switch future.python_scalars to True in 4.0, then deprecate the future option.
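The repr difference in question: under NumPy >= 2, scalar reprs are type-qualified, which is what breaks doctests expecting plain values, while Python scalars repr the same everywhere (quick illustration):

```python
import numpy as np

# NumPy 1.x:  repr(np.int64(2)) == "2"
# NumPy 2.x:  repr(np.int64(2)) == "np.int64(2)"
print(repr(np.int64(2)))

# The unboxed Python scalar reprs identically on both versions:
print(repr(np.int64(2).item()))  # 2
```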